The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Authors

Richard Zhang

Phillip Isola

Alexei A. Efros

Eli Shechtman

Oliver Wang

Abstract

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called “perceptual losses”? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

1. Motivation

The ability to compare data items is perhaps the most fundamental operation underlying all of computing. In many areas of computer science it does not pose much difficulty: one can use Hamming distance to compare binary patterns, edit distance to compare text files, Euclidean distance to compare vectors, etc. The unique challenge of computer vision is that even this seemingly simple task of comparing visual patterns remains a wide-open problem. Not only are visual patterns very high-dimensional and highly correlated, but the very notion of visual similarity is often subjective, aiming to mimic human visual perception.
For instance, in image compression, the goal is for the compressed image to be indistinguishable from the original by a human observer, irrespective of the fact that their pixel representations might be very different. Classic per-pixel measures, such as the ℓ2 Euclidean distance metric, commonly used for regression problems, or the related Peak Signal-to-Noise Ratio (PSNR), are insufficient for assessing structured outputs such as images, since they assume each output pixel is conditionally independent of all others, given the input. A well-known example is that blurring an image causes large perceptual but small Euclidean change. What we would really like is a “perceptual distance”, which measures how similar two images are in a way that coincides with human judgment. This problem, often ...

arXiv:1801.03924v1 [cs.CV] 11 Jan 2018

[Figure: original and perturbed patches; panel (a) Traditional]
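The per-pixel measures named above can be illustrated with a minimal sketch. The function names `mse` and `psnr` and the toy 2×4 grayscale images are assumptions made for illustration; this is not code from the paper. It uses the standard definition PSNR = 10 · log10(MAX² / MSE) and shows that two structurally different perturbations receive exactly the same score, because every pixel is scored independently of its neighbours.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized grayscale images,
    each given as a list of rows of pixel values."""
    ref = [p for row in reference for p in row]
    dist = [p for row in distorted for p in row]
    if len(ref) != len(dist):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)


def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels; infinite for identical images."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical toy data: a flat gray reference image.
reference = [[100, 100, 100, 100],
             [100, 100, 100, 100]]

# Perturbation 1: every pixel brightened by 4 (smooth, uniform change).
uniform_shift = [[104, 104, 104, 104],
                 [104, 104, 104, 104]]

# Perturbation 2: pixels alternately raised and lowered by 4 (checkerboard texture).
checkerboard = [[104, 96, 104, 96],
                [96, 104, 96, 104]]

psnr_shift = psnr(reference, uniform_shift)
psnr_checker = psnr(reference, checkerboard)
print(psnr_shift, psnr_checker)  # the two values are identical
```

Both perturbations change every pixel by the same magnitude, so the per-pixel error cannot tell a uniform brightness change from an added texture, although the two look quite different to a viewer.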


Related articles

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of PD patients who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


The Effectiveness of Rebound Therapy on Improving Perceptual Visual Coordination and Social Development of Students with Learning Disabilities

Objective: The purpose of this study was to investigate the effectiveness of rebound therapy on improving perceptual visual coordination and social development of students with learning disabilities. Method: This is applied research. A semi-experimental design with pre-test and post-test and a control group was used. The statistical population consisted of ...


Comparison of Effectiveness of Motor-Working Memory Training and Perceptual-Motor Exercises on Digit Span and Letter–Number Sequencing in Educable Children with Intellectual Disabilities

Background and Objective: Appropriate programs should be provided to improve the function of memory, learning, and the effects of processing efficiency in the daily life of children with intellectual disabilities. Therefore, the present study aimed to compare the effectiveness of motor-working memory training and perceptual-motor exercises on digit span and letter-number sequencing in educable ...


Image authentication using LBP-based perceptual image hashing

Feature extraction is a main step in all perceptual image hashing schemes, in which robust features lead to better perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted among the distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...



Journal: CoRR

Volume: abs/1801.03924

Publication date: 2018
